CONSCIOUS AND 
UNCONSCIOUS 
MENTALITY 


Examining their Nature, Similarities 
and Differences 


Edited by Juraj Hvorecký, Tomáš Marvan and 
Michal Polák 

First published 2024 

ISBN: 978-1-032-52979-0 (hbk) 


ISBN: 978-1-032-52974-5 (pbk) 
ISBN: 978-1-003-40952-6 (ebk) 


14 


TEMPLATE TUNING AND GRADED 
CONSCIOUSNESS 


Berit Brogaard and Thomas Alrik Sørensen 


(CC-BY-NC-ND) 4.0 
DOI: 10.4324/9781003409526-18 


This Chapter was funded by Aalborg University and Thomas Alrik Sørensen via 
funding from the Sino-Danish Center for Education and Research 


Routledge 
Taylor & Francis Group 
LONDON AND NEW YORK 




14.1 Introduction 


Whether visual perceptual consciousness is gradable or dichotomous has 
been the subject of fierce debate in recent years (e.g., Sergent and Dehaene 
2004; Eiserbeck et al. 2022). To see what is at stake, it will be helpful to 
introduce Ned Block’s (1995; 2005; 2007) distinction between access 
consciousness and phenomenal consciousness. According to this distinction, 
a perceptual state is phenomenally conscious (or P-conscious) when the 
perceiver is subjectively (or phenomenally) aware of what is represented 
by the state. By contrast, a perceptual state is access-conscious (or A- 
conscious) when its representational content is accessible to the perceiver 
for post-perceptual tasks such as verbal reports, reasoning, and action 
planning. It is generally agreed that, for the content of a perceptual state 
to be accessible to the perceiver it must be represented in working memory 
(Baars 1997; Baddeley 2012), or in another perceptual memory store like 
visual short-term memory (Sørensen and Kyllingsbæk 2012). In spite of 
the fact that P- and A-consciousness are distinct conceptual constructs, the 
received view in philosophy and psychology is that they actually co-occur 
and thus are important characteristics of the same phenomenon (Brogaard 
2011a; 2011b). Assuming the latter view, the hypothesis that perceptual 
consciousness is dichotomous holds that a perceiver has full access to 
(i.e., is A-conscious of), and thus is fully phenomenally aware of (i.e., is P- 
conscious of) perceptual information that is represented in some memory 
store (e.g., visual). By contrast, the hypothesis that perceptual consciousness 
is gradable holds that a perceiver may have less than full access to—and thus 
be less than fully phenomenally aware of—perceptual information that is 
represented in working memory. This raises a question: In virtue of what can 
a subject be less than fully A- and P-conscious of perceptual information? 
In this chapter, we provide an answer to this question, according to which 
inexact categorizations of visual input may result in a representation of 
the visual information in working memory that is less than fully available 
to the perceiver, and of which the perceiver therefore is less than fully 
phenomenally aware. The latter proposal is a natural extension of a theory of 
perception we have proposed in previous works, namely, the template tuning 
theory (TTT; Brogaard and Sørensen 2023, in press a, b). We argue that 
TTT is compatible with both a gradable and a dichotomous conception of 
perceptual consciousness, but suggest that the available empirical evidence 
favours the view that perceptual consciousness is a graded phenomenon. 


14.2 Graded consciousness 


The question arises as to what we might mean by “degraded consciousness.” 
What we are interested in here is the A- and P-consciousness associated with 
(visual) perceptual experience, or perceptual A- and P-consciousness for short. 
On a common conception of perceptual consciousness, degraded perceptual 
P-consciousness is associated with reduced visibility or visual clarity of what 
is perceptually represented, whereas degraded A-consciousness is associated 
with reduced cognitive access to the perceptual information available for visual 
working memory (VWM) tasks (Brogaard 2018). One phenomenon that 
intuitively leads to degraded perceptual A- and P-consciousness is reduced 
signal acuity with respect to stimulus features, such as shape and texture. 
Reduced signal acuity (or resolution) can occur as a result of suboptimal 
viewing conditions, physiological, and neurophysiological abnormalities in 
the visual system, or less than full allocation of cognitive resources (e.g., 
attention).¹ For example, in the street scene in Figure 14.1, the reduced 
visibility and the reduced availability of exact spatial information make it 
difficult to identify the objects making up the scene. However, scene context 
can aid identification. For example, the “car” on the left is identical to the 
“pedestrian” on the right after a 90-degree rotation (Oliva and Torralba 
2007). Yet, we are able to recognize these objects by relying on the scene 
context. 

Reduced attention can also intuitively lead to degraded perceptual 
consciousness (Brogaard 2015).² For example, when focusing your attention 
on the fixation point between the two Gabor patches in Figure 14.2, the two 
gradients appear to have different spatial resolutions (or texture), but when 
you covertly attend (without moving your eyes) to the left patch, the two 
gradients appear to have the same spatial resolution. 

FIGURE 14.1 Reduced spatial acuity. In this image, the “pedestrian” on the right 
is identical to the “car” on the left after a 90-degree rotation, making 
scene context essential to object identification. 

Source: From Oliva and Torralba 2007. 


FIGURE 14.2 If you attend to the fixation point, the two patches appear to have 
different spatial resolutions; if you fixate on the fixation point 
but covertly attend to the left patch, they appear to have the same 
spatial resolution. 


Source: From Carrasco et al. 2004. 


In line with these intuitive examples of degraded consciousness, 
experimental approaches to test whether consciousness is graded have aimed 
to modulate A-consciousness, combined with a measure of P-consciousness, 
by manipulating either the visual input (e.g., backward masking, low contrast) 
or the allocation of cognitive resources (e.g., attentional blink; AB; Raymond 
et al. 1992). 

The gold standard for determining whether P-consciousness is graded 
is the perceptual awareness scale (PAS) (Ramsøy and Overgaard 2004; cf. 
Overgaard and Sørensen 2004, Overgaard et al. 2006, combining PAS with 
introspection). In PAS, participants are first presented with a series of visual 
stimuli and then asked to evaluate the subjective visibility or visual clarity of 
their perceptual experiences on a four-point scale: “clear image,” “almost clear 
image,” “weak glimpse,” and “not seen.” 

Finally, some studies have used electrophysiological (EEG) recordings 
to measure whether graded modulations of A-consciousness result in a 
corresponding modulation of event-related brain potentials (ERPs) that have 
been proposed as markers of A- or P-consciousness (e.g., Tagliabue et al. 
2016). Proposed ERP markers of A- or P-consciousness include P1, P3, N1, 
N2, and N3 (Koivisto and Revonsuo 2003; Pins and ffytche 2003; Wilenius 
and Revonsuo 2007; Koivisto et al. 2008). In a comprehensive review of 
EEG studies of visual consciousness, however, Koivisto and Revonsuo (2010) 
found that the most reliable and most consistently observed ERP marker of 
visual consciousness is the “visual awareness negativity” (VAN) effect, an 
early-to-late negative wave deflection at posterior or anterior recording sites, 
with peak latency around 200-450 ms after stimulus onset, thus overlapping 
N1-N2 (Figure 14.3). The VAN effect is usually followed by a long-lasting 
P3 or late positive (LP) effect over the parietal lobes around 400 ms after 
stimulus onset. The correlation with visual consciousness of both VAN and 
LP has led some authors to suggest that VAN is an electrophysiological 
correlate of P-consciousness, whereas LP is a correlate of A-consciousness 
(Tagliabue et al. 2016). However, several studies have observed LP only for 
task-relevant/reported stimuli, suggesting that this component may be a marker 
of post-perceptual processing rather than A-consciousness (e.g., Koivisto 
and Revonsuo 2007; 2010; Pitts et al. 2012; Koch et al. 2016; Cohen et al. 
2020). For example, LP has been found to co-vary with subjects’ confidence 
in their perceptual judgements (Koivisto and Revonsuo 2010). Likewise, early 
positive components like the P1 are not markers of consciousness, as physical 
properties of the stimulus (e.g., luminance contrast) can elicit P1, regardless 
of whether it is encoded in VWM. 

FIGURE 14.3 The typical scalp distribution of the VAN effect, the most consistent 
marker of visual consciousness, and the LP effect, a likely marker of 
post-perceptual processing. 


Source: From Koivisto and Revonsuo 2010. 

A large body of experiments has provided support for the hypothesis that 
consciousness is graded. The studies manipulating the visibility of the stimulus 
using backward masking, combined with a visibility measure, have shown 
that mean visibility ratings follow either a gradable pattern (e.g., Ramsøy and 
Overgaard 2004; Sergent and Dehaene 2004) or else a gradable pattern for 
low-level tasks (e.g., “red or blue?”) and a dichotomous pattern for high-level 
tasks (e.g., “smaller or larger than 5?”) (Windey et al. 2013; cf. Windey et al. 
2014). The finding of a dichotomous pattern of mean visibility ratings in high- 
level conditions (e.g., “smaller or larger than 5?”) does not provide evidence 
against the hypothesis that perceptual consciousness is graded, as solving high- 
level tasks requires post-perceptual processing. 

Attentional Blink (AB) experiments, which manipulate the allocation of 
cognitive resources, have yielded less consistent results. In the AB paradigm 
developed by Raymond et al. (1992), participants were asked to first identify 
the only white letter (T1) and then the letter X (T2) in a series of letters 
presented in rapid succession at a rate of 10 items per second (Figure 14.4). At 
the end of the presentation, participants were then asked to indicate whether 
they saw the two targets T1 and T2. T2 occurred in 50 per cent of the trials 
within 100-800 ms following T1 (lag 1 to lag 8). Raymond et al. (1992) 
found that at short T2-T1 intervals, participants who reported T1 correctly 
would tend to report T2 incorrectly (with the exception of the lag 1 sparing 
effect). 
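
The timing of the stream can be made concrete with a short sketch. It is an
illustration only, not part of Raymond et al.'s method: the function name
`lag_to_soa_ms` is hypothetical, and the only values taken from the text are
the presentation rate of 10 items per second and lags 1 to 8.

```python
# Illustrative sketch: convert an attentional blink "lag" (the number of
# items between T1 and T2, counted inclusively of T2) into the T1-to-T2
# onset interval in milliseconds.
ITEM_RATE_HZ = 10  # items per second, as stated in the text


def lag_to_soa_ms(lag: int, rate_hz: float = ITEM_RATE_HZ) -> float:
    """Return the T1-to-T2 onset interval in ms (hypothetical helper)."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return lag * 1000.0 / rate_hz


# Lags 1 to 8 at 10 items per second span 100 to 800 ms.
soas_ms = {lag: lag_to_soa_ms(lag) for lag in range(1, 9)}
```

At this rate each lag step adds 100 ms, which is why lags 1 to 8 correspond
to the 100-800 ms window mentioned above.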


FIGURE 14.4 Illustration of the rapid serial visual presentation used in the 
attentional blink paradigm developed by Raymond et al. (1992). 
Target 1 (T1) is the white letter; target 2 (T2) is the letter “X,” 
which may or may not be present. 

While the mechanism underlying attentional blink is not fully understood, 
the prevailing explanation, originally suggested by Raymond et al. (1992), is 
that attention to T1 interferes with the perception of T2. They referred to this 
as an “attentional blink.” To rule out that the findings were due to the targets 
and distractors masking each other, they conducted a control experiment, 
where participants were only asked to identify the letter X (T2). Here, they 
did not observe any reporting inaccuracy, which suggests that the attentional 
blink was due to attention to T1 rather than masking. It might also be thought 
that participants reported T2 inaccurately at short T2-T1 intervals, because 
they failed to retain both targets in VWM until the end of the trial. However, 
this explanation does not align with what we know about the capacity and 
duration of VWM (e.g., Dall et al. 2021). One problem with Raymond et al.’s 
(1992) explanation of AB is that it doesn’t explain lag-1 sparing—that is, 
the absence of AB during lag 1. An alternative explanation that does explain 
lag-1 sparing is that the processing of T1 locks the attentional resources, 
making them unavailable for T2 processing. The reason we see lag-1 sparing 
is that during lag 1 the processing of T1 has not yet locked the attentional 
resources. Once the processing of T1 has locked the attentional resources, the 
performance with respect to T2 deteriorates. As the attentional resources are 
gradually released during later lags, however, performance with respect to T2 
slowly improves. 

The AB paradigm has become a popular way of testing whether visual 
consciousness is graded. In a recent study, Eiserbeck et al. (2022) used a 
variation on the AB paradigm, combined with a four-point PAS-like response 
option and EEG measures. In each trial, 13 images were shown in rapid 
succession, each for 107 ms (Figure 14.5). The images consisted of two targets 
(T1, T2) and 11 distractors. T1 was a dog in half of the trials and a muffin 
in the others and was presented as either the third (long lag) or the seventh 
item (short lag). T2 was a face and was presented as the 10th stimulus in all 
but 3 percent of trials, where it was replaced with a distractor. Prior to each 
trial, participants were asked to look for either a dog or a muffin, and a face, 
which would not always be present. After each trial, participants indicated via a 
response key (a) whether they saw a dog or muffin (dog/muffin/don’t know); 
(b) whether they saw a male or female face (male/female/don’t know); and 
(c) how clear their subjective impression of the face was on the four-point 
visibility scale (1=not seen, 2=slight impression, 3=strong impression, 4=seen 
completely), very similar to the PAS. During each trial, EEG brainwave activity 
was recorded from 62 scalp sites. 
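
The trial structure just described can be laid out as a small sketch. The
function names `build_trial` and `t1_t2_soa_ms` are hypothetical; the item
count, the 107 ms duration, and the target positions come from the
description above, and the assumption that items follow one another without
a gap is ours.

```python
# Illustrative layout of one trial in the variant AB paradigm described
# above. Assumption: items are shown back to back with no gap between them.
ITEM_MS = 107   # duration of each image
N_ITEMS = 13    # images per trial
T2_POS = 10     # serial position of T2 (1-based)


def build_trial(t1_pos: int, t2_present: bool = True):
    """Return a list of (onset_ms, label) pairs for one trial."""
    if t1_pos not in (3, 7):
        raise ValueError("T1 appears as the third or the seventh item")
    labels = ["distractor"] * N_ITEMS
    labels[t1_pos - 1] = "T1"
    if t2_present:
        labels[T2_POS - 1] = "T2"
    return [(index * ITEM_MS, label) for index, label in enumerate(labels)]


def t1_t2_soa_ms(t1_pos: int) -> int:
    """Onset interval between T1 and T2 in ms for a given T1 position."""
    return (T2_POS - t1_pos) * ITEM_MS
```

Under these assumptions the short lag (T1 as the seventh item) corresponds
to a T1-T2 onset interval of 321 ms and the long lag (T1 as the third item)
to 749 ms.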

The results revealed that in the short-lag T2-present trials, accuracy 
increased and RTs decreased in the objective gender-classification task with 
higher visibility ratings (Figure 14.6). 

[Figure 14.5 panel text: each picture shown for 107 ms (no ISI); T2-absent 
condition: distractor instead of T2 (17 % of trials).] 

FIGURE 14.5 Structure of long- and short-lag trials. After each trial, participants 
answered three questions regarding T1-identity (dog/muffin/don’t 
know), T2-gender, and T2-visibility. 

Source: From Eiserbeck et al. 2022. 


[Figure 14.6 panel title: Performance per subjective visibility level (short lag, 
T2 present).] 

FIGURE 14.6 Accuracy and reaction time in the gender classification task as a 
function of visibility ratings in the short-lag-T2-present trials. The 
mean accuracy for “not seen” is very low, because participants were 
instructed to choose “I don’t know” rather than 1 = “not seen” if 
uncertain about the face’s gender. 


Source: From Eiserbeck et al. 2022. 


The electrophysiological recordings showed graded ERP-modulations 
corresponding to the different visibility ratings in the N1, N2, and P3 
components, but not in the early P1 component, confirming previous findings 
that P1 is not a marker of visual consciousness (Figure 14.7). 

In addition, Sørensen and colleagues (2014) combined a VWM resolution 
paradigm with Landolt rings (Wilken and Ma 2004; see Figure 14.8, and 
also Figure 14.9) with PAS report to investigate the relationship between P- 
consciousness and VWM. 

FIGURE 14.7 (a) Mean activity in P1, N1, N2, and P3 following T2-onset relative to 
the T2-absent condition for the different PAS-like ratings. (b) Mean 
wave amplitudes for P1, N1, N2, and P3 as a function of visibility 
ratings relative to the T2-absent condition. 


Source: From Eiserbeck et al. 2022. 


The findings pointed to systematic set-size effects within the four PAS 
categories. That is, even though participants report the same degree or 
clearness of experience, their response performance seemed to be modulated 
by the set-size of the individual trial types. So, although participants report 
the same clarity of content, increasing the set-size of the memory array 
systematically increases the guessing rate and decreases the resolution of 
retention (Sørensen et al. 2014). Using Block’s (1995) terminology, these 
results thus indicate that there may be varying degrees of access to the same 
phenomenological content, depending on the set-size of the Landolt rings 
presented in the experiment. Taken together, these findings and those of 
Eiserbeck et al. (2022) suggest that consciousness is graded. 

[Figure 14.8 trial sequence: Fixation (500 ms), Stimulus (140 ms), Masks (500 ms), 
PAS (wait), Probe (wait), Feedback (wait).] 

FIGURE 14.8 VWM sets of one, two, or four Landolt rings are presented in six 
possible placeholders, followed by a mask on all possible positions. 
Then five of the six placeholders are removed, and the participants 
are asked to report their awareness of the target in the remaining 
position using PAS. They are then presented with a wireframe probe, 
which they are asked to orient to match the shape of the target in 
this position. 
Source: From Sørensen et al. 2014. 


However, a couple of older AB experiments have pointed to consciousness 
as dichotomous. Sergent and Dehaene (2004) conducted an AB experiment 
using a continuous visibility scale in single- and dual-task conditions. T1 was 
either “XOOX” or “OXXO,” and T2, which could be present or absent, 
was one of the French number words, “DEUX,” “CINQ,” “SEPT,” or 
“HUIT.” T1 and T2 were embedded among random strings of uppercase 
consonants generated from all consonants except Q, T, and X. In the single- 
task condition, native French speakers were asked to rate the visibility of 
T2 by moving a slider on a continuous visibility scale labelled “not seen” 
at the left and “maximal visibility” at the right. In the dual-task condition, 
participants were subsequently asked to identify the two middle letters of 
T1 (“OO” or “XX”). The results did not reveal any significant differences 
in visibility between the different lags in the single-task condition. In the 
dual-task condition, mean visibility ratings followed a dichotomous pattern, 
with statistically significant correlations between the shortest and longest 
lags and “maximal visibility” ratings and between attentional blink lags 
and “not seen” ratings. In a second AB experiment, Sergent and Dehaene 
manipulated the duration of T2 across different trials. Here, they found that 
the visibility ratings of T2 followed a dichotomous pattern within each lag 
of the attentional blink. Their results thus point to a notion of consciousness 
as dichotomous. 

[Figure 14.9 panel labels: Fixation; T2 Response Instruction (“Color of SECOND 
square?”); T2 Response Wheel; Feedback (500 ms; “Error:”); T1 Response 
Instruction (“FIRST square: black or white?”).] 

FIGURE 14.9 The colour AB task. (a) T1 and T2 were embedded in a rapid serial 
visual presentation stream of coloured circles. (b) Subjects first 
reported the colour of T2 using a colour wheel and then whether T1 
was black or white. 


Source: From Asplund et al. 2014. 


However, these results are questionable. To rule out that participants used a 
dichotomous response criterion in their visibility ratings, Sergent and Dehaene 
conducted a backward masking study, again using the French number words, 
“DEUX,” “CINQ,” “SEPT,” or “HUIT,” presented at different stimulus 
durations across different trials. After a presentation of the mask for 300 
ms, participants rated the visibility of the masked words on the continuous 
visibility scale. Here, mean visibility ratings were found to gradually increase 
with increasing stimulus durations, which they took to rule out a response bias: 


subjects reported seeing increasingly more detailed aspects of the masked 
stimuli—from a few features to single letters, graphemes, and finally the 
whole word—and they traduced this increasing detail by continuously 
varying the cursor on the visibility scale. 

(Sergent and Dehaene 2004, 727) 


However, pace Sergent and Dehaene, these findings do not rule out a response 
bias. In the backward masking experiment, the masked stimuli were all T2- 
letter words. In the AB experiment, by contrast, a T2-letter word was only 
present in some trials. In T2-present trials, subjects who had only a slight 
impression of T2 may not have been able to tell whether T2 was a word or a 
distractor (e.g., “CINQ” vs. “CVNG”), which would mean that they would 
have responded with “not seen,” as “don’t know” was not a response option. 


[Figure 14.10 panels: Lag 2 and Lag 8. x-axis: Response Error (°), from -180 to 
180; y-axis: Frequency. Parameter values printed in the panels: P = 0.719, 
P = 0.538, σ = 20.4, σ = 20.6.] 

FIGURE 14.10 Distribution of response errors for T2 in the colour AB 
task, aggregated across all subjects separately for lags 2 and 
8. P = probability of T2 encoding; σ = precision of T2 encoding. 


Source: From Asplund et al. 2014. 


Moreover, because distractors did not contain vowels or the consonants 
Q, T, or X, the participants may have been able to identify T2-number words 
like “DEUX,” “CINQ,” or “HUIT” from “EUX,” “INQ,” or “UIT” in T2- 
present trials, which would have led to a “seen”-rating. Yet “seen”-ratings 
would have been the result of a judgement about the stimulus’ identity, not a 
perception of it. As we cannot rule out response bias, this could be a potential 
confound of Sergent and Dehaene’s study. 

In a subsequent AB experiment, Asplund et al. (2014) looked at the 
precision of T2-identifications at different T2-T1 lags in a colour and a face 
identification task. In the colour identification task, T1 was either a black 
or a white square, and T2 was a coloured square (Figure 14.9). Participants 
watched a rapid serial presentation of a black/white square (T1) and a coloured 
square (T2) embedded among coloured circles (distractors) at either a short 
lag (200 ms) or a long lag (800 ms) and a T2 duration of either 100 or 200 
ms. Participants were then asked to identify T2 on a continuous colour wheel. 
Subjects received immediate feedback on response error and were then asked 
whether T1 was black or white. The face identification task was analogous with 
the T2-identification given by moving a cursor on a continuous face wheel. 

The response error was given by the distance between the reported and 
the correct value on the continuous response wheel, and the precision of a 
participant’s responses was given by the standard deviation of the distribution 
of their response errors from the correct value. The smaller the standard 
deviation, the greater the precision. The findings revealed the same T2 
precision at shorter and longer T2-T1 lags (Figure 14.10). 
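
The two quantities defined above can be sketched in a few lines. This is a
minimal illustration under stated assumptions, not Asplund et al.'s analysis
code: the helper names are hypothetical, angles are in degrees on a
360-degree wheel, and a plain sample standard deviation is used, which
treats the errors as linear values rather than fitting a circular model.

```python
import statistics


def response_error_deg(reported: float, correct: float) -> float:
    """Signed distance on the wheel, mapped into the interval (-180, 180]."""
    diff = (reported - correct) % 360.0  # now in [0, 360)
    return diff - 360.0 if diff > 180.0 else diff


def precision_sd(errors: list[float]) -> float:
    """Sample standard deviation of response errors (smaller = more precise)."""
    return statistics.stdev(errors)


# Hypothetical responses: (reported, correct) wheel positions in degrees.
trials = [(350.0, 10.0), (10.0, 350.0), (95.0, 90.0)]
errors = [response_error_deg(rep, cor) for rep, cor in trials]
```

The wrap-around step matters because a report of 350 degrees for a correct
value of 10 degrees is an error of 20 degrees, not 340.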

Asplund et al. take this to support the hypothesis that consciousness is 
dichotomous, reasoning as follows. If consciousness is dichotomous, visual 
information is either encoded in VWM or not. So, with a longer T2-T1 lag 
and less attention allocated to T1, the probability that T2 is represented 
in VWM should increase, but the precision of T2 identifications should 
remain constant. Conversely, if consciousness is graded, a longer T2-T1 
lag should correspond to an increase in the precision of T2-responses. As 
they found that the precision of T2-responses remained constant at shorter 
and longer T2-T1 lags, they take their findings to point to a dichotomous 
conception of consciousness. However, this interpretation rests on the 
assumption that if consciousness is graded, the increase in attentional 
resources with longer T2-T1 lags should lead to greater precision. But 
degree of precision does not by itself have any bearing on the question of 
whether consciousness is graded. Imprecision is just a form of inaccuracy 
(or non-veridicality). Yet an inaccurate perceptual representation can be 
subjectively indistinguishable from its veridical counterpart. So, variation 
in precision does not reflect a gradability of conscious accessibility or visual 
clarity (PAS rating) (Sørensen et al. 2014; cf. Pincham et al. 2016), but 
may just reflect an increase in cognitive load or attentional resources, or 
misleading external cues. 
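
The two predictions that Asplund et al. contrast can be pictured with a
simple mixture density over response errors. The sketch below is our
illustration, not their fitted model: it assumes that encoded trials
produce Gaussian errors with standard deviation sigma and that non-encoded
trials produce uniform guesses over the wheel, and all parameter values are
hypothetical.

```python
import math


def mixture_density(error_deg: float, p_encoded: float, sigma_deg: float) -> float:
    """Density of a response error under a Gaussian-plus-uniform mixture."""
    gaussian = math.exp(-0.5 * (error_deg / sigma_deg) ** 2) / (
        sigma_deg * math.sqrt(2.0 * math.pi)
    )
    uniform = 1.0 / 360.0  # random guess anywhere on the wheel
    return p_encoded * gaussian + (1.0 - p_encoded) * uniform


# Hypothetical parameter sets (not values reported by Asplund et al.).
short_lag = {"p_encoded": 0.5, "sigma_deg": 20.0}
# Dichotomous prediction: a longer lag raises the encoding probability only.
long_lag_dichotomous = {"p_encoded": 0.7, "sigma_deg": 20.0}
# Graded prediction (simplified): a longer lag sharpens precision.
long_lag_graded = {"p_encoded": 0.5, "sigma_deg": 15.0}
```

On this picture the dichotomous reading changes only the height of the
central peak relative to the flat guessing floor, whereas the graded reading
changes its width. The objection raised above is that the width of this peak
need not track conscious accessibility or visual clarity in the first place.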

The available evidence thus suggests that consciousness is graded. But 
that presents a problem for so-called race models of perception, as these are 
committed to a dichotomous conception of visual consciousness. 


14.3 Race/biased-choice models of perception 


Perceptual information is collected via the senses and is processed by the brain. 
In the case of visual perception, reflected light stimulates cells in the retina, 
which translate the information carried by the light into a neural signal, 
travelling through the thalamus towards the primary visual cortex and then 
extrastriate areas (V4-V5/MT). While this bottom-up process is modulated 
by backward projections in the visual stream, it is commonplace to think 
of the visual process as a linear progression from lower to higher cognitive 
subsystems (e.g., Gazzaniga et al. 2018). According to an influential model 
of perceptual processing originally advanced by Atkinson and Shiffrin (1968), 
external input is transferred to a sensory register and from here onto a short- 
term working memory store, from which inputs are selected for encoding 
and representation in long-term memory. Atkinson and Shiffrin (1968) even 
speculate that information could potentially be transferred 
directly from the sensory register into long-term memory without any 
representation in short-term memory. The key insight of this model is that 
insofar as information is represented in short-term memory, this happens prior 
to its encoding and representation in long-term memory. 

On an alternative proposal, initial attentional selection for representation 
in VWM partly depends on the sensory evidence that an object belongs to a 
certain object category, and the sensory evidence in turn is related to how well 
the sensory information matches template categories in long-term memory. 
This theoretical suggestion forms the basis of Bundesen’s “A Theory of Visual 
Attention” (TVA), which is a model of attention that combines elements 
of race models of perception with elements of biased-choice models (e.g., 
Bundesen 1990; Bundesen et al. 2005; Bundesen and Habekost 2008). Race 
models of perception hold that what is represented in VWM is the result of a 
stochastic race between the results of matching a given visual signal to all object 
and feature categories stored in long-term memory. Biased-choice models of 
perception hold that what is represented in VWM is biased by payoff history 
(e.g., Luce 1963). 
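
The idea of a stochastic race can be illustrated with a minimal sketch. It
assumes, for illustration, that each candidate categorization finishes after
an exponentially distributed time governed by its own processing rate, so
the first to finish is the one represented in VWM. The category names, the
rate values, and the function names are hypothetical.

```python
import random


def win_probabilities(rates: dict[str, float]) -> dict[str, float]:
    """Chance that each categorization finishes first in an exponential race."""
    if not rates or any(rate <= 0 for rate in rates.values()):
        raise ValueError("every categorization needs a positive rate")
    total = sum(rates.values())
    return {category: rate / total for category, rate in rates.items()}


def simulate_race(rates: dict[str, float], rng: random.Random) -> str:
    """Draw one finishing time per categorization and return the winner."""
    finish_times = {c: rng.expovariate(rate) for c, rate in rates.items()}
    return min(finish_times, key=finish_times.get)


# Hypothetical rates: strong sensory evidence for "car", weak for "pedestrian".
rates = {"car": 3.0, "pedestrian": 1.0}
probabilities = win_probabilities(rates)
winner = simulate_race(rates, random.Random(0))
```

With independent exponential finishing times, the probability of winning is
simply a categorization's rate divided by the sum of all rates, which is why
stronger template matches tend to, but do not always, win the race.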

A combined race/biased-choice model like TVA is supported by a 
variety of studies demonstrating that familiarity and expertise with specific 
categories fundamentally affect the processing and representation of sensory 
information in working memory. Sørensen and Kyllingsbæk (2012) showed 
that as expertise with letters increased in different age groups, short- 
term memory capacity also increased while performance in non-trained 
categories (line drawings) remained stable. This pattern was later replicated 
in a slightly different paradigm investigating short-term memory capacity 
for letters, line drawings, and Japanese hiragana symbols (Dall et al. 2016). 
Measuring working-memory capacity in three groups of university students, 
Dall and colleagues (2016) demonstrated that the variation in expertise 
with hiragana drives how much information participants can retain in short- 
term memory. Both control conditions (letters and line drawings) were 
unaffected across the three expertise groups, as predicted. These findings 
have received additional support from studies demonstrating that memory 
capacity is higher for cartoons that are known than cartoons that are 
similar but not known (Xie and Zhang 2017) and higher for real flags that 
participants are familiar with than pseudo flags that are unfamiliar to the 
participants (Conci et al. 2021). In fact, the capacity of working memory 
seems to be driven solely by the degree of familiarity for expert participants 
independent of other factors, for example, simple versus complex objects. 
In a recent study, Dall et al. (2021) investigated how Chinese participants 
process Chinese characters. The stimulus was manipulated along two 
dimensions: physical and perceived complexity. Physical complexity was 
defined by the stroke count of the characters and the perceived complexity 
by the word frequency of the character (e.g., the character for “mountain” 
is more frequently used than that for “embroidery”), which enabled us to 
analyse high and low complexity over the four categories (viz., high perceived 
and high physical complexity, low perceived and high physical complexity, 
low perceived and low physical complexity, and high perceived and low 
physical complexity). Dall et al. reported that for the Chinese participants 
who were considered to be experts in reading Chinese, VWM capacity was 
driven solely by word frequency or perceived complexity, independently of 
stroke count. Processing speed was found to follow a similar pattern, with 
increased processing speed for familiar objects. By contrast, the threshold 
for perception was unaffected by complexity, both perceived and physical 
(Dall et al. 2021). These results demonstrate that expertise, or strength 
of category templates in long-term memory, has a significant impact on 
processing speed and accuracy of representation in VWM, which in turn 
suggests a reversal of the relationship between the role of short- and long- 
term memory in perceptual processing. 


14.4 The template tuning theory of perception 


In previous works, we have developed a model of visual perception, which 
we call the template tuning theory (TTT; Brogaard and Sørensen 2023, 
in press a, b). The theory was proposed as a theoretical model expanding 
on the basic premise that encoding relies on matching sensory information 
with mental templates in long-term memory (cf., Bundesen 1990). These 
templates can be honed with expertise to enhance categorization of objects 
or scenes belonging to categories that perceivers are more familiar with 
(e.g., Sørensen and Kyllingsbæk 2012; Dall et al. 2021). In line with 
Bundesen (1990), TTT posits that the perceptual mechanism can be driven 
by a stimulus bias (also called filtering) or a categorical bias (also called 
pigeonholing).³ In the case of stimulus bias, perceptual processing is preceded 
by an attentional weighting of the incoming visual signals from across the 
entire visual field. If, say, a red dot against a green background captures your 
attention, then signals from the red dot are weighted higher than signals from 
the green background. In the case of categorical bias, perceptual processing 
is preceded by a strategic prioritization of a template, which increases the 
likelihood that a categorization is made if an incoming visual signal matches 
the template, as in a case where your house is burning, and you are searching 
for a fire extinguisher. 
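
The difference between the two biases can be sketched with the rate and
weight equations of Bundesen's (1990) TVA, in which the rate of categorizing
object x as belonging to category i is eta(x, i) * beta_i * w_x / sum of all
w_z, with attentional weight w_x = sum over j of eta(x, j) * pi_j. Whether
TTT adopts exactly these equations is an assumption made here for
illustration, and all numerical values and names are hypothetical.

```python
def attentional_weights(eta, pertinence):
    """Filtering: w_x = sum_j eta[x][j] * pertinence[j]."""
    return {
        obj: sum(evidence.get(cat, 0.0) * pi for cat, pi in pertinence.items())
        for obj, evidence in eta.items()
    }


def processing_rates(eta, beta, pertinence):
    """Rate of each categorization: eta * beta * relative attentional weight."""
    weights = attentional_weights(eta, pertinence)
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one object needs a positive weight")
    return {
        (obj, cat): strength * beta.get(cat, 0.0) * weights[obj] / total
        for obj, evidence in eta.items()
        for cat, strength in evidence.items()
    }


# Hypothetical sensory evidence (eta) for a red dot on a green background.
eta = {
    "dot": {"red": 0.9, "green": 0.1},
    "background": {"red": 0.1, "green": 0.9},
}
pertinence = {"red": 1.0, "green": 0.0}  # stimulus bias towards red items
beta = {"red": 0.5, "green": 0.5}        # no categorical bias yet

baseline = processing_rates(eta, beta, pertinence)
# Pigeonholing: prioritizing the "red" template doubles its bias.
prioritized = processing_rates(eta, {"red": 1.0, "green": 0.5}, pertinence)
```

In this sketch, filtering changes which object's signals are weighted more
heavily, while pigeonholing scales every categorization into the prioritized
category without changing the attentional weights.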

Assuming the visual signals from a given stimulus in the visual field are 
weighted higher than signals from other items in the visual field (i.e., stimulus 
bias), TTT stipulates that the perceptual process begins with the brain
extracting object or scene gists from the prioritized visual signals in the
early visual system (Figure 14.11) (Brogaard and Sørensen 2023, in press a,
b). Object and scene gists convey coarse-grained information about object
contours, object surface patterns, global scene layout, and statistical scene 
regularities (e.g., printers are frequently found in offices) (Schyns and Oliva 
1994; Bar 2004; Auckland et al. 2007; Oliva and Torralba 2007; Võ and
Wolfe 2015; Võ et al. 2019).

FIGURE 14.11 Gist perception at reduced speed.


Source: From KSU, Vision Cognition Laboratory. 


This process occurs at a very rapid pace (Lowe et al. 2018) and once 
extracted, object and scene gists are rapidly projected to late stages of the 
visual ventral stream (e.g., Kveraga et al. 2007). Here, they activate templates 
in long-term memory corresponding to singleton or generic perceptual 
categories (e.g., the class containing a familiar person’s face or the class 
of square objects) (Brogaard and Sørensen 2023, in press a, b). The visual
input is also processed more slowly in a partial bottom-up fashion in the 
early visual ventral stream by well-defined low-level visual processes, such 
as double-opponent processes (cf. Bar et al. 2006; Torralba et al. 2006). 
After undergoing early pre-conscious perceptual processing, the partially 
processed visual signal is then matched with the activated object or scene 
templates in long-term memory until the best match has been identified 
(categorization). The categorization of the visual signal coincides with its 
selection for and representation in VWM, which makes the information 
consciously available for post-perceptual tasks (e.g., reporting, reflection, 
or decision-making). The diagram in Figure 14.12 illustrates the key 
components of TTT.⁴

FIGURE 14.12 A diagrammatic representation of the TTT model, describing the
relationship between cognitive systems and processes involved 
in selection and representation of active perceptual information 
in VWM. 


Source: Created by the Authors. 
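
The processing sequence described above (gist extraction, activation of a subset of templates, matching, and encoding in VWM) can be sketched in a few lines. This is an illustrative toy under strong simplifying assumptions, not an implementation of TTT: signals and templates are reduced to two-dimensional feature vectors, the gist rule is an arbitrary threshold on the first feature, and the best match is simply the nearest template by Euclidean distance. The class, function names, and values are all hypothetical.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Template:
    category: str     # category label stored in long-term memory
    gist: str         # coarse class under which the template is activated
    features: tuple   # fine-grained feature vector (hypothetical)


def extract_gist(signal):
    """Fast, coarse pass: a hypothetical rule that uses only the first feature."""
    return "round" if signal[0] >= 0.5 else "angular"


def activate_templates(gist, long_term_memory):
    """The gist pre-selects the subset of templates considered for matching."""
    return [t for t in long_term_memory if t.gist == gist]


def categorize(signal, long_term_memory):
    """Return (category, mismatch) for the best activated template, or None."""
    candidates = activate_templates(extract_gist(signal), long_term_memory)
    if not candidates:
        return None
    best = min(candidates, key=lambda t: dist(signal, t.features))
    return best.category, dist(signal, best.features)


long_term_memory = [
    Template("coin", "round", (0.9, 0.2)),
    Template("ball", "round", (0.8, 0.9)),
    Template("box", "angular", (0.1, 0.5)),
]

vwm = []  # categorized items become available for post-perceptual tasks
result = categorize((0.85, 0.8), long_term_memory)
if result is not None:
    vwm.append(result)
print(vwm)
```

The design choice worth noting is that the signal is never compared with the "box" template: the gist restricts matching to the activated subset before any fine-grained comparison takes place.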


14.5 Template tuning theory and graded consciousness 


The attentional model that lent inspiration to TTT, as originally conceived, 
is committed to a dichotomous conception of visual consciousness: Once a 
categorization of a visual signal is made, the information is fully consciously 
accessible to the perceiver, which is to say that visual consciousness is 
dichotomous (Bundesen and Habekost 2008). Even ambiguous categorizations
of a visual signal (say in the instance of bistable figures like the Necker cube, 
Figure 14.13) will result in representations in working memory that are 
fully consciously accessible to the perceiver. Of course, the representation in 
working memory may be re-encoded or reinterpreted (if, e.g., the perceiver 
experiences a shift in the surface of the Necker cube), but the re-encoded or 
reinterpreted representation will nonetheless still depend on a categorization 
of the visual signal. 

If the attentional model that lent inspiration to TTT is committed to a 
dichotomous conception of visual consciousness, the question arises whether 
TTT is similarly committed. Of course, whether visual consciousness is graded 
is ultimately an empirical question. But as we have seen, the available evidence 
suggests that visual consciousness is graded to some degree (e.g., Eiserbeck 
et al. 2022; Overgaard and Sørensen 2004; Ramsøy and Overgaard 2004; 
Sørensen et al. 2014). This then gives us reason to develop a version of TTT 
that can accommodate these findings. It may perhaps be thought that reduced 
visibility and availability of perceptual information stems from a lack of suitable 
templates due to lack of familiarity or expertise. This proposal will not do,
however, as our cognitive system can deploy multiple templates to determine
the category of the object or feature. Say you see the Japanese Kanji “7X” for
the first time. Even if you do not have a dedicated template representation,
you may be able to combine more basic shape templates for the categorization
of the Japanese symbol, for instance, “A” and “ft.” Although such “makeshift”
templates may slow you down on cognitive tasks, this clearly should not reduce
your access to, or phenomenal awareness of, the perceptual information
encoded in VWM. As our cognitive system can deploy a complex of connected
templates to process new visual information, poor templates are not necessarily
correlated with degraded perceptual consciousness.

FIGURE 14.13 The Necker Cube is a bistable illusion whereby observers typically
see the figure shift in which side of the cube is towards the back and
which is towards the front.


Source: Created by the Authors.

However, degraded consciousness may be the result of poor template 
matching. One option here is that an impoverished visual signal prevents an 
exact match between the signal and one of the activated templates. Another 
possibility is that the activated templates are suboptimal due to an impoverished 
object gist, which might also prevent an exact match between the signal and 
one of the activated templates. We can refer to inexact matches between a signal 
and an activated template as “prediction errors” (or “categorization errors”; 
cf. Brogaard and Sørensen, in press a). If an inexact categorization is made, the
perceptual information encoded in VWM will inevitably be sparser, which may 
result in degraded visibility and availability of perceptual information. If this 
suggestion is on the right track, then the size of the prediction error should 
be directly correlated with the degree of degradation, at least up to the point 
where no categorization is made. While models like TVA assume that sensory 
evidence is matched with all possible categories in memory, TTT makes the 
assumption that there is a guided template matching procedure, shaped by
context, expectations, and gist. This stage pre-selects or limits the subset of 
potential category matches to be made. 
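
The proposed relation between prediction error and degree of degradation can be illustrated with a small sketch. The linear mapping, the numerical threshold beyond which no categorization is made, and the function name are all assumptions introduced here for illustration; the text above commits only to the claim that larger prediction errors should go with greater degradation, up to the point where categorization fails.

```python
def graded_visibility(prediction_error, threshold=1.0):
    """Map a mismatch between signal and template to a visibility score.

    Returns a value in (0, 1], where 1.0 corresponds to an exact match, or
    None when the error reaches the (hypothetical) threshold at which no
    categorization is made and nothing is encoded in VWM.
    """
    if prediction_error < 0:
        raise ValueError("prediction error cannot be negative")
    if prediction_error >= threshold:
        return None
    # Assumed linear relation: visibility falls as the prediction error grows.
    return 1.0 - prediction_error / threshold


for error in (0.0, 0.25, 0.5, 1.2):
    print(error, graded_visibility(error))
```

On a dichotomous conception, by contrast, the function would return full visibility for every error below the threshold; the graded variant differs only in letting the size of the mismatch carry over into the resulting representation.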

Despite being based on race/biased-choice models of perception, 
which operate with a dichotomous conception of visual consciousness, a 
modified version of TTT can thus explain the empirical data pointing to a 
graded conception. To see this, consider Eiserbeck et al.’s (2022) finding
that attention to T1 during short T2-T1 lags attenuates the visibility and
availability of T2 information. TTT can explain this finding either in terms
of reduced signal quality (thereby increasing the prediction error) or reduced
gist quality (widening the subset of potential categories, and thus also 
increasing the prediction error). Attenuated attention to T2 during AB might 
have impaired the quality of the visual signal, thus making it less likely that 
the signal will exactly match one of the activated templates. Alternatively, 
diminished attention to T2 during AB might have impaired the quality of the 
gist information, which would also reduce the likelihood that the signal would 
exactly match one of the activated templates. 

These alternatives are, of course, not mutually exclusive. But whereas an 
inexact match due to an impoverished signal may explain graded consciousness 
in masking/low contrast experiments, an inexact match due to an impoverished 
T2 gist may at least partly explain the findings in AB experiments. Indeed, 
Eiserbeck et al. (2022) found graded ERP-modulations corresponding to the 
different visibility ratings in the N1, N2, and P3 components. P3 may reflect 
VWM encoding and post-perceptual VWM tasks, but N1 and N2 are widely 
regarded as indicators of perceptual processing prior to encoding in VWM. The 
graded pattern found in the N1 component may reflect disruptions in early 
perceptual processing as a result of attenuated attention to T2 during the short 
T2-T1 lags. Diminished attentional engagement with T2 may have interfered 
with the extraction of T2-gist information, leading to an impoverished T2 
gist and the activation of face templates lacking some, though not all, gender 
cues. This, in turn, would explain why accuracy decreased and RTs increased 
in the gender-identification task with lower visibility ratings in the short-lag
T2-present trials.

While TTT in principle is compatible with both a graded and a dichotomous 
conception of consciousness, the empirical evidence presented above seems to 
favour the former. 

Notes


1 Something similar can occur as a result of eye lens abnormalities, which can
reduce the acuity of the visual signal with respect to spatial features (e.g., form)
of stimuli that are far away (myopia, or nearsightedness) or close up (hyperopia, or
farsightedness).

2 Different types of attention might affect access in different ways. For instance, if an 
array of stimuli (or a natural scene) is globally and diffusely attended, there may be 
decreased access to information about the elements making up the array; cf. Lopez 
(2020). 

3 “Or” is here to be read as the inclusive “or.” We leave open the question of whether 
perceptual processing always involves both stimulus bias and categorical bias. 

4 The term “iconic memory” originates in Sperling (1960), who found evidence 
of the existence of a transient, yet high-capacity, visual memory store. In one 
condition, participants were briefly presented (50 ms) with an array of consonants 
(3 rows and 4 columns) and asked to report as many consonants as possible. In this 
condition, Sperling found that the participants were able to report an average of 4.4 
consonants. In a second condition, an individual row was cued immediately after the 
presentation of the consonants. In this condition, participants were able to report 
3.3 consonants in the cued row. Sperling took this to suggest that participants were 
storing nearly all the consonants in a way that allowed them to attend to the cued 
row and encode the cued consonants in VWM after the presentation. When the cue 
was delayed by 1 s, however, the volunteers were only able to recall an average of 1.5
consonants from the cued row, suggesting that the iconic memory representation 
of the visual array had decayed. Block (1995) has interpreted these findings as 
evidence of P-consciousness in the absence of A-consciousness. However, another 
explanation is that in the first condition only the scene (i.e., the array of the 12 
consonants) was represented in VWM as a single diffusely attended item, whereas 
in the second condition the cue was able to cue a specific row in the scene gist, 
allowing for three of the consonants in the row to become represented in VWM as 
three distinct items. 